document processing
Arctic-Extract Technical Report
Chiliński, Mateusz, Ołtusek, Julita, Jaśkowski, Wojciech
Arctic-Extract is a state-of-the-art model designed for extracting structural data (question answering, entities and tables) from scanned or digital-born business documents. Despite its SoTA capabilities, the model weighs only 6.6 GiB, making it deployable on resource-constrained hardware such as A10 GPUs with 24 GB of memory. Arctic-Extract can process up to 125 A4 pages on those GPUs, making it suitable for long document processing. This paper highlights Arctic-Extract's training protocols and evaluation results, demonstrating its strong performance in document understanding.
- North America > United States (0.68)
- South America > Chile (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
- Law (0.69)
- Leisure & Entertainment (0.46)
- Information Technology (0.46)
TalentMine: LLM-Based Extraction and Question-Answering from Multimodal Talent Tables
Mannam, Varun, Wang, Fang, Liu, Chaochun, Chen, Xin
In talent management systems, critical information often resides in complex tabular formats, presenting significant retrieval challenges for conventional language models. These challenges are pronounced when processing talent documentation that requires precise interpretation of tabular relationships for accurate information retrieval and downstream decision-making. Current table extraction methods struggle with semantic understanding, resulting in poor performance when integrated into retrieval-augmented chat applications. This paper identifies a key bottleneck: while structural table information can be extracted, the semantic relationships between tabular elements are lost, causing downstream query failures. To address this, we introduce TalentMine, a novel LLM-enhanced framework that transforms extracted tables into semantically enriched representations. Unlike conventional approaches relying on CSV or text linearization, our method employs specialized multimodal reasoning to preserve both structural and semantic dimensions of tabular data. Experimental evaluation across employee benefits document collections demonstrates TalentMine's superior performance, achieving 100% accuracy in query answering tasks compared to 0% for standard AWS Textract extraction and 40% for AWS Textract Visual Q&A capabilities. Our comparative analysis also reveals that the Claude v3 Haiku model achieves optimal performance for talent management applications. The key contributions of this work include (1) a systematic analysis of semantic information loss in current table extraction pipelines, (2) a novel LLM-based method for semantically enriched table representation, (3) an efficient integration framework for retrieval-augmented systems as end-to-end systems, and (4) comprehensive benchmarks on talent analytics tasks showing substantial improvements across multiple categories.
- Information Technology > Security & Privacy (0.68)
- Information Technology > Services (0.68)
- Law > Labor & Employment Law (0.49)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
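The contrast the TalentMine abstract draws between CSV linearization and a semantically enriched representation can be illustrated with a small sketch. This is not the TalentMine method, which relies on LLM-based multimodal reasoning; it is a rule-based stand-in that only shows why attaching column and row context to each value helps retrieval. The function names, the sentence template, and the benefits table are all hypothetical.

```python
import csv
import io


def linearize_csv(header, rows):
    """Plain CSV linearization: structure survives, but a retrieved
    fragment such as '$500' carries no hint of what it describes."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def enrich_rows(header, rows, key_column=0):
    """Hypothetical enrichment: emit one self-describing sentence per cell,
    so each retrievable chunk names its row key and its column."""
    key_name = header[key_column]
    sentences = []
    for row in rows:
        for i, (column, value) in enumerate(zip(header, row)):
            if i == key_column:
                continue  # the key itself is used as context, not as a fact
            sentences.append(
                f"For {key_name} '{row[key_column]}', {column} is {value}."
            )
    return sentences


# Made-up example table, not taken from the paper.
header = ["Plan", "Monthly premium", "Deductible"]
rows = [["Basic", "$50", "$2,000"], ["Premium", "$120", "$500"]]

print(linearize_csv(header, rows))
for sentence in enrich_rows(header, rows):
    print(sentence)
```

In a retrieval-augmented setup, each enriched sentence can be indexed as its own chunk, whereas a CSV row retrieved without its header leaves the model to guess which value answers the query.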
Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA
Turski, Michał, Chiliński, Mateusz, Borchmann, Łukasz
Checkboxes are critical in real-world document processing where the presence or absence of ticks directly informs data extraction and decision-making processes. Yet, despite the strong performance of Large Vision and Language Models across a wide range of tasks, they struggle with interpreting checkable content. This challenge becomes particularly pressing in industries where a single overlooked checkbox may lead to costly regulatory or contractual oversights. To address this gap, we introduce the CheckboxQA dataset, a targeted resource designed to evaluate and improve model performance on checkbox-related tasks. It reveals the limitations of current models and serves as a valuable tool for advancing document comprehension systems, with significant implications for applications in sectors such as legal tech and finance. The dataset is publicly available at: https://github.com/Snowflake-Labs/CheckboxQA
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
- North America > United States > Massachusetts (0.04)
How to be recession ready with intelligent automation
Businesses of all sizes are bracing for a recession. Still, while it may sound counterintuitive, this is actually the right time to accelerate digital transformation. Historically, an economic downturn is a boon for innovation. According to Morgan Stanley, roughly half of Fortune 500 companies were founded in times of recession or economic crisis. Investing in digital transformation will help businesses overcome a slowdown and address talent shortages.
The Next ChatGPT Revolution: Intelligent Document Processing
ChatGPT, the state-of-the-art language model developed by OpenAI, is poised to have a significant impact on the B2B industry. This powerful technology has the potential to disrupt traditional business processes and open up new opportunities for companies across a wide range of industries when it comes to intelligent document processing. One of the key areas where ChatGPT is likely to have an impact is in automating routine tasks and customer interactions. Another area where ChatGPT is likely to be disruptive is in the generation of written content. This technology can be used to quickly and accurately generate reports, product descriptions, and other written materials.
An Introduction to Microsoft Syntex
Despite a global rush toward enterprise digital transformation, the document remains at the heart of most businesses, and unfortunately, managing documents remains a distinctly manual process. Despite its structured nature, the flexibility of a document makes it hard to automate business processes, and taking data from multiple line-of-business applications to insert it in a document is a matter of cut-and-paste, from screen to document and often back again once a document is received. Launched at Ignite in October 2022, Microsoft Syntex is here to solve some of these tediously manual issues, adding document processing tools to SharePoint. The solution uses machine learning to help construct and parse documents, turning a manual process into one where humans guide and check software, and where legal, regulatory and contractual requirements are still met. In this in-depth look at Syntex, learn more about content AI and some of the current use cases for this release.
- Information Technology (0.68)
- Law (0.50)
AI presents opportunity to show customers that insurance 'values' their data
Artificial intelligence (AI), machine learning and digitalisation are key words that those tasked with improving business processes in the insurance sector will be well aware of – but these tools also present the opportunity to show end customers that the insurance industry values them. This was according to Andy Fairchild, advisor and non-executive director at various insurance-related firms and owner of consultancy Julyfourth Services. Speaking during an Insurance Times webinar entitled AI: A driving force for the future of insurance yesterday (24 November 2022), in association with Inawisdom, Fairchild explained: "[AI] is how we can show customers that we value collecting their data more than we currently do. We must, as an industry, show customers how important that data is and how important data collection for the provision of an insurance product is." AI processing of customer data and the use of AI-enabled chatbots to respond to customer queries would improve the customer experience by speeding up often slow customer journeys, said Fairchild. But the collection of data behind these operations has to be improved too. Fairchild continued: "We can get that customer data from a person-to-person interaction or – increasingly – from a person-to-machine interaction and therein lies a big move for the industry." Fairchild added that the better collection and deployment of data to construct AI models could transform customers' interactions with the insurance sector from a "trudge process" into something that "they really value". AI and machine learning also have the potential to "revolutionise" the insurance sector in terms of risk selection and pricing if data collection improves, Fairchild added.
He explained: "The fundamentals of our industry are risk, risk selection, the terms that we underwrite that risk selection on and the price that we put on it." However, Sameer Deshpande, head of enterprise architecture at broker PIB Group, said that the insurance sector was lagging behind other areas of financial services in its use of artificial intelligence. Deshpande explained: "There are a number of areas where [insurance is] still behind the curve – [for example,] manual processes and document processing.
Putting artificial intelligence and machine learning workloads in the cloud
Artificial intelligence (AI) and machine learning (ML) are some of the most hyped enterprise technologies and have caught the imagination of boards, with the promise of efficiencies and lower costs, and the public, with developments such as self-driving cars and autonomous quadcopter air taxis. Of course, the reality is rather more prosaic, with firms looking to AI to automate areas such as online product recommendations or spotting defects on production lines. Organisations are using AI in vertical industries, such as financial services, retail and energy, where applications include fraud prevention and analysing business performance for loans, demand prediction for seasonal products and crunching through vast amounts of data to optimise energy grids. All this falls short of the idea of AI as an intelligent machine along the lines of 2001: A Space Odyssey's HAL. But it is still a fast-growing market, driven by businesses trying to drive more value from their data, and automate business intelligence and analytics to improve decision-making. Industry analyst firm Gartner, for example, predicts that the global market for AI software will reach US$62bn this year, with the fastest growth coming from knowledge management.
- Information Technology > Services (1.00)
- Transportation > Passenger (0.70)
- Transportation > Ground > Road (0.35)
Information Extraction from Visually Rich Documents with Font Style Embeddings
Oussaid, Ismail, Vanhuffel, William, Ratnamogan, Pirashanth, Hajaiej, Mhamed, Mathey, Alexis, Gilles, Thomas
Information extraction (IE) from documents is an intensive area of research with a large set of industrial applications. Current state-of-the-art methods focus on scanned documents with approaches combining computer vision, natural language processing and layout representation. We propose to challenge the usage of computer vision in the case where both token style and visual representation are available (i.e., native PDF documents). Our experiments on three real-world complex datasets demonstrate that using an embedding based on token style attributes instead of a raw visual embedding in the LayoutLM model is beneficial. Depending on the dataset, such an embedding yields an improvement of 0.18% to 2.29% in the weighted F1-score with a decrease of 30.7% in the final number of trainable parameters of the model, leading to an improvement in both efficiency and effectiveness.
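As a rough illustration of what a token style representation might look like, the sketch below turns the style attributes a native PDF exposes into a fixed-length feature vector. The abstract does not say which attributes the authors use or how they encode them, so the attribute set, the font-family buckets, the size cap, and all names here are assumptions. In a model such as LayoutLM, a vector like this would typically pass through a learned projection before being added to the token embedding; that step is omitted.

```python
from dataclasses import dataclass

# Hypothetical font-family buckets and size cap; not taken from the paper.
FONT_FAMILIES = ["serif", "sans-serif", "monospace", "other"]
MAX_FONT_SIZE = 72.0


@dataclass
class Token:
    text: str
    font_family: str
    font_size: float  # in points
    bold: bool
    italic: bool


def style_features(token):
    """Encode style as: one-hot font family, size scaled to [0, 1], bold, italic."""
    family = token.font_family if token.font_family in FONT_FAMILIES else "other"
    one_hot = [1.0 if family == name else 0.0 for name in FONT_FAMILIES]
    size = min(token.font_size, MAX_FONT_SIZE) / MAX_FONT_SIZE
    return one_hot + [size, float(token.bold), float(token.italic)]


heading = Token("Total", "sans-serif", 18.0, bold=True, italic=False)
print(style_features(heading))
```

A vector of this kind has only a handful of dimensions per token, which is in line with the abstract's point that replacing a raw visual embedding with style attributes reduces the number of trainable parameters.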
AiThority Interview with Prince Kohli, CTO, Automation Anywhere
I've always had a passion and curiosity for technology, which started at a young age in India, where I was raised, and ultimately led to a fulfilling, quarter-century-long career in Silicon Valley. It has been an exciting ride. What drew me to Automation Anywhere is our vision to create software bots that can automate repetitive, manual business tasks, and free our minds to focus on the more creative, strategic, high-level work. Before Automation Anywhere, I held leadership roles at various companies including Ericsson, running global R&D for all digital products, and before that, at Citrix, in the cloud and enterprise groups, after co-founding a security start-up that was acquired by Citrix. My role has changed as the market and the world have changed.
- North America > United States > California (0.25)
- Asia > India (0.25)
- Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.05)